Heart Attack Analysis & Prediction Project

Project goal: perform exploratory data analysis on heart attack data and predict the chance of a heart attack. The dataset is available on Kaggle: https://www.kaggle.com/datasets/rashikrahmanpritom/heart-attack-analysis-prediction-dataset?select=heart.csv

Data Dictionary

Target: heart attack (0 - less chance of heart attack, 1 - more chance of heart attack)

Age: age of the patient
Sex: sex of the patient (0 - female, 1 - male)
cp: chest pain type (0 - typical angina, 1 - atypical angina, 2 - non-anginal pain, 3 - asymptomatic)
trtbps: resting blood pressure (in mm Hg)
chol: cholesterol in mg/dl, fetched via BMI sensor
fbs: fasting blood sugar > 120 mg/dl (1 - true, 0 - false)
restecg: resting electrocardiographic results (0 - normal, 1 - ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV), 2 - probable or definite left ventricular hypertrophy by Estes' criteria)
thall: thallium stress test result (0 - NA, 1 - fixed defect, 2 - normal blood flow, 3 - reversible defect)
thalach: maximum heart rate achieved (bpm)
exang: exercise-induced angina (1 - yes, 0 - no)
oldpeak: previous peak value
slp: slope of the peak exercise ST segment (0 - downsloping, 1 - flat, 2 - upsloping)
caa: number of major vessels (0-3)
o2saturation: oxygen saturation level

Packages import

Data

EDA

Correlation matrix

The correlation matrix shows no strong correlation between the target (heart_attack) and any single feature. The features that correlate with the target the most are: positively, chest pain type (cp, corr coef: 0.43) and maximum heart rate (corr coef: 0.42); negatively, exercise-induced angina (corr coef: -0.44) and old peak (corr coef: -0.43).

Among the features themselves, the highest correlation is between slope (slp) and old peak (corr coef: -0.58).
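The coefficients quoted above are the kind of values a pairwise Pearson correlation matrix produces. The following is a minimal sketch, assuming pandas is used; the data is synthetic stand-in data (not heart.csv), and only the column names `oldpeak`, `slp` and `heart_attack` are taken from this notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; these are NOT the values from heart.csv.
rng = np.random.default_rng(0)
n = 300
oldpeak = rng.uniform(0.0, 4.0, size=n)
# Slope is constructed to fall as oldpeak rises, so the pair correlates negatively.
slp = np.clip(np.round(2 - 0.5 * oldpeak + rng.normal(0, 0.5, size=n)), 0, 2)
# Target is constructed so that low oldpeak goes with class 1.
heart_attack = (oldpeak + rng.normal(0, 1.0, size=n) < 1.5).astype(int)

df = pd.DataFrame({"oldpeak": oldpeak, "slp": slp, "heart_attack": heart_attack})

# DataFrame.corr() computes pairwise Pearson correlation by default.
corr = df.corr()

# Correlation of each feature with the target, ordered by absolute strength.
target_corr = corr["heart_attack"].drop("heart_attack")
order = target_corr.abs().sort_values(ascending=False).index
print(target_corr.reindex(order))
```

Sorting by absolute value keeps strongly negative features, such as old peak here, next to the strongly positive ones.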

Categorical features distribution and analysis

The dataset contains 207 males and 96 females, so there are more than twice as many males as females.

138 persons have less chance of heart attack and 165 persons have more chance of heart attack.

Most of the patients with less chance of heart attack (104) have typical angina as their chest pain type. In contrast, patients with more chance of heart attack mostly have non-anginal pain.

Patients with less chance of heart attack in most cases have a flat slope of the peak exercise ST segment, whereas patients with more chance of heart attack usually have an upsloping one.

Patients with an ST-T wave abnormality in their resting electrocardiographic results have a higher chance of heart attack.

Patients with normal thallium stress test results have a much higher chance of heart attack than the others.

According to the histogram, for the majority of people in both groups (with high and low chance of heart attack) the fasting blood sugar value is below 120 mg/dl.

Patients with no exercise-induced angina have a higher chance of heart attack.

Patients whose number of major vessels equals 0 have a much higher chance of heart attack.
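Statements like the ones above come from comparing the share of each target class within every category. The following is a minimal sketch of that comparison, assuming pandas; the eight rows are hand-made for illustration and are not taken from heart.csv.

```python
import pandas as pd

# Tiny hand-made sample for illustration; values are NOT from heart.csv.
df = pd.DataFrame({
    "exang":        [0, 0, 0, 0, 1, 1, 1, 0],
    "heart_attack": [1, 1, 1, 0, 0, 0, 1, 1],
})

# Raw counts of each target class per category.
counts = pd.crosstab(df["exang"], df["heart_attack"])

# Row-normalised shares: each row sums to 1, so column 1 is the fraction
# of patients in that category who fall in the "more chance" class.
rates = pd.crosstab(df["exang"], df["heart_attack"], normalize="index")

print(counts)
print(rates)
```

Row-normalised shares are easier to compare than raw counts when the categories differ a lot in size, as the sexes do in this dataset.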

Continuous features

People between the ages of 50-55 have more chances of heart attack than people in their early 60s.

Boxplots (Outliers detection) and continuous features distribution according to target variable

People with a lower previous (old) peak value have a higher chance of heart attack.

People with a lower maximum heart rate have less chance of heart attack. For most people with a higher chance of heart attack, the maximum heart rate is between 150 and 180 bpm.

For both groups (with high and low chance of heart attack) the distributions of blood pressure and cholesterol values are almost the same.

For both groups (with high and low chance of heart attack) the oxygen saturation level in most cases is 97.5.
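Boxplots conventionally mark as outliers the points that lie more than 1.5 × IQR beyond the quartiles. The following is a minimal sketch of that rule, assuming NumPy; the helper `iqr_outliers` and the sample values are hypothetical and do not come from this notebook.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

# Illustrative cholesterol-like readings with one extreme value (made up).
chol = [200, 210, 220, 230, 240, 250, 260, 564]
print(iqr_outliers(chol))
```

Whether such points should be dropped, capped or kept is a separate decision; the rule only flags them.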

According to the plot, it can be seen that:

Prediction Models

Setting target variable and predictors
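A minimal sketch of this step, assuming the target column is named `heart_attack` as in the correlation section; the four rows below are made up for illustration.

```python
import pandas as pd

# Made-up rows for illustration; the real frame would hold all features of heart.csv.
df = pd.DataFrame({
    "age": [63, 37, 41, 56],
    "chol": [233, 250, 204, 236],
    "heart_attack": [1, 1, 0, 0],
})

y = df["heart_attack"]                 # target variable
X = df.drop(columns=["heart_attack"])  # predictors: every remaining column

print(X.columns.tolist(), y.tolist())
```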

Data Scaling
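SVC and KNN depend on distances between samples, so features measured on very different scales (for example mm Hg and mg/dl) are usually standardised first. The following is a minimal sketch, assuming scikit-learn's `StandardScaler`; the notebook does not state which scaler it uses, and the numbers are made up.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two made-up features on different scales (e.g. blood pressure and cholesterol).
X = np.array([
    [120.0, 200.0],
    [130.0, 240.0],
    [140.0, 280.0],
])

scaler = StandardScaler()
# fit_transform learns each column's mean and standard deviation, then rescales.
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))  # each column is centred on zero
print(X_scaled.std(axis=0))   # each column has unit standard deviation
```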

Splitting data to test and train
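A minimal sketch of the split, assuming scikit-learn's `train_test_split`; the test fraction of 0.2, the `random_state` and the stratification are assumptions for illustration, not settings taken from this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 20 samples, 2 features, balanced binary target.
X = np.arange(40, dtype=float).reshape(20, 2)
y = np.array([0] * 10 + [1] * 10)

# stratify=y keeps the class proportions the same in both parts;
# random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
```

Stratifying matters less for a nearly balanced target such as this one (138 vs 165), but it costs nothing and keeps a small test set representative.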

SVC model (Linear Kernel)

KNN model

Logistic Regression

Among the three implemented models (SVC, KNN and Logistic Regression), the highest accuracy, 89.5%, was achieved with the SVC classifier.
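The comparison above can be sketched as follows, assuming scikit-learn. The data is synthetic (`make_classification`), and the hyperparameters (`n_neighbors=5`, `max_iter=1000`, the 0.2 test fraction) are assumptions, so this sketch does not reproduce the 89.5% figure; it only shows the shape of such a comparison.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the heart data; sizes are chosen arbitrarily.
X, y = make_classification(
    n_samples=303, n_features=13, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "SVC (linear kernel)": SVC(kernel="linear"),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    # The pipeline fits the scaler on the training part only, then the model.
    pipe = make_pipeline(StandardScaler(), model)
    pipe.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, pipe.predict(X_test))

# Report test accuracy, best model first.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Wrapping the scaler and the model in one pipeline is a design choice of this sketch: it keeps test-set statistics out of the scaling step. With a test set of a few dozen rows, differences of a point or two between models are within the noise of a single split.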